Module 2: Inversion

In the previous module we started with a continuous distribution of a physical property and discretized it into many cells, then when performed a forward simulation that created data from known model parameters. Inversion, of course, is exactly the opposite process. Imagine each model parameter that we had represents a layer in a 1D layered earth. At the surface of the earth we measure the data, and when we invert we do so for the model parameters. Our goal is to take the observed data and recover models that emulate the real Earth as closely as possible.

You may have noticed that the act of discretizing our problem created more cells than data values. In our latter example we produced 20 data points from 1000 model parameters, which is only a few data points and many model parameters. While this was not much of a problem in the forward simulation, when we want to do the inverse process, that is, obtain the model parameters from the data, it is clear that we have many more unknowns than knowns. In short, we have an underdetermined problem, and therefore infinite possible solutions. In mathematical terms, geophysical surveys represent what are called "ill-posed" problems.

An "ill-posed" as a problem that is not a "well-posed" problem. A well-posed problem is a problem in mathematics that must satisfy all three of the following criteria:

  1. A solution exists.
  2. The solution is unique.
  3. The solution's behaviors change continuously with continuously changing initial conditions.

Any mathematical formulation that does not satisfy all three of the above is, by definition, an ill-posed problem. Since we are dealing with an underdetermined system, I hope that it is clear that we are dealing with an ill-posed problem (i.e., we have no unique solution), and we are going to have to come up with a method (or methods) that can help us choose from the available solutions. We need to devise an algorithm that can choose the "best" model from the infinitely many that are available to us.

In short, we are going to have to find an optimum model. More specifically, in the context of most geophysics problems, we are going to use gradient-based optimization. This process involves building an objective function, which is a function that casts our inverse problem as an optimization problem. We will build an objective function consisting of two parts:

$$ \phi = \phi_d + \beta \phi_m $$

Where the terms on the right hand side are (1) a data misfit (denoted as $\phi_d$) and (2) a model regularization (denoted as $\phi_m$). These two parts will be elaborated in detail below.

Once we have formulated the model objective function, we will take derivatives and obtain a recovered model. This module will flesh out the details of the model objective function, and then take first and second derivatives to derive an expression that gives us a solution for our model parameters.

The Data Misfit, $\phi_d$

A misfit describes how close synthetic data matches measurements that are made in the field. Traditionally this term refers to the difference between the measured data and the predicted data. If these two quantities are sufficiently close, then we consider the model to be a viable candidate for the solution to our problem. Because the data are inaccurate, a model that reproduces those data exactly is not feasible. A realistic goal, rather, is to find a model whose predicted data are consistent with the errors in the observations, and this requires incorporating knowledge about the noise and uncertainties. The concept of fitting the data means that some estimate of the “noise” be available. Unfortunately “noise” within the context of inversion is everything that cannot be accounted for by a compatible relationship between the model and the data. More specifically, noise refers to (1) noise from data aquisition in the field, (2) uncertainty in source and receiver locations, (3) numerical error, (4) physical assumptions about our model that do not capture all of the physics.

A standard approach is to assume that each datum, $d_i$, contains errors that can be described as Gaussian with a standard deviation $\epsilon_i$. It is important to give a significant amount of thought towards assigning standard deviations in the data, but a reasonable starting point is to assign each $\epsilon_i$ as $\epsilon_i= floor +\%|d_i|$.

Incorporating both the differences between predicted and measured data and a measure of the uncertainties in the data yields our misfit function, $\phi_d$:

$$ \phi_d (m) = \frac{1}{2} \sum_{i=1}^N \left( \frac{F[m] -d_i^{obs} }{\epsilon_i}\right)^2 = \frac{1}{2} \|W_d(F[m] - d^{obs}) \|_2^2 $$

Note that the right hand size of the equation is written as a matrix-vector product, with each $\epsilon_i$ in the denominator placed as elements on a diagonal matrix $W_d$, as follows:

\begin{equation} \begin{split} W_d = \begin{bmatrix} \frac{1}{\epsilon_1} & 0 & 0 & \cdots & 0\\ 0 & \frac{1}{\epsilon_2} & 0 & \cdots & 0\\ 0 & 0 & \frac{1}{\epsilon_3} & \cdots & \vdots\\ 0 & 0 & 0 & \ddots & \frac{1}{\epsilon_M}\\ \end{bmatrix} \end{split} \end{equation}

If we return to linear problem from the previous section where our forward operator was simply a matrix of kernel functions, we can substitute $F[m]$ with $G$ and obtain $$ \phi_d (m) = \frac{1}{2} \sum_{i=1}^N \left( \frac{(Gm)_i -d_i^{obs} }{\epsilon_i}\right)^2 = \frac{1}{2} \|W_d(Gm - d^{obs}) \|_2^2 $$

Now that we have defined a measure of misfit, the next task is to determine a tolerance value, such that if the misfit is about equal to that value, then we have an acceptable fit. Suppose that the standard deviations are known and that errors are Gaussian, then $\phi_d$ becomes a $\chi_N^2$ variable with $N$ degrees of freedom. This is a well-known quantity with an expected value $E[\chi_N^2]=N$ and a standard deviation of $\sqrt{2N}$. Basically, what this means is that computing $\phi_d$ should give us a value that is close to the number of data, $N$.

The Model Regularization, $\phi_m$

There are many options for choosing a model regularization, but the goal in determining a model regularization is the same: given that we have no unique solution, we must make assumptions in order to recast the problem in such a way that a solution exists. A general function used in 1D is as follows:

$$ \phi_m = \alpha_s \int (m)^2 dx + \alpha_x \int \left( \frac{dm}{dx} \right)^2 dx $$

Each term in the above expression is a norm that measures characteristics about our model. The first term is a representation of the square of the Euclidean length for a continuous function, and therefore measures the length of the model, while the second term uses derivative information to measure the model's smoothness. Usually the model regularization is defined with respect to a reference model. In the above, the reference model would simply be zero, but choosing a non-zero reference model $m_{ref}$, yields the following: $$ \phi_m = \alpha_s \int (m-m_{ref})^2 dx + \alpha_x \int \left( \frac{d}{dx} (m-m_{ref}) \right)^2 dx $$

As before, we will discretize this expression. It is easiest to break up each term and treat them separately, at first. We will denote each term of $\phi_m$ as $\phi_s$ and $\phi_x$, respectively. Consider the first term. Translating the integral to a sum yields:

$$ \phi_s = \alpha_s \int (m)^2 dx \rightarrow \sum_{i=1}^N \int_{x_{i-1}}^{x_i} (m_i)^2 dx = \sum_{i=1}^N m_i^2 (x_i - x_{i-1}) $$

Each spatial "cell" is $x_i - x_{i-1}$, which is the distance between nodes, as you may recall from the previous module. To simplify notation, we will use $\Delta x_{n_i}$ to denote the ith distance between nodes:


We can then write $\phi_s$ as:

$$ \phi_s = \alpha_s \sum_{i=1}^N m_i^2 \Delta x_{n_i} = \alpha_s m^T W_s^T W_s m = \alpha_s \|W_s m\|_2^2 $$

with:

\begin{equation} \begin{split} W_d = \begin{bmatrix} \frac{1}{\sqrt{\Delta x_{n_1}}} & 0 & 0 & \cdots & 0\\ 0 & \frac{1}{\sqrt{\Delta x_{n_2}}} & 0 & \cdots & 0\\ 0 & 0 & \frac{1}{\sqrt{\Delta x_{n_3}}} & \cdots & \vdots\\ 0 & 0 & 0 & \ddots & \frac{1}{\sqrt{\Delta x_{n_N}}}\\ \end{bmatrix} \end{split} \end{equation}

For the second term, we will do a similar process. First, we will delineate $\Delta x_{c_i}$ as the distance between cell centers:


A discrete approximation to the integral can be made by evaluating the derivative of the model based on how much it changes between the cell-centers, that is, we will take the average gradient between the $ith$ and $i+1th$ cells:

$$ \phi_x = \alpha_x \int \left( \frac{dm}{dx} \right)^2 dx \rightarrow \sum_{i=1}^{N-1} \left( \frac{m_{i+1}-m_i}{h_k}\right) \Delta x_{c_i} = m^T W_x^T W_x m = \|W_x m\|_2^2 $$

The matrix $W_x$ is a finite difference matrix constructed thus:

\begin{equation} \begin{split} W_d = \begin{bmatrix} -\frac{1}{\sqrt{\Delta x_{c_1}}} & \frac{1}{\sqrt{\Delta x_{c_1}}} & 0 & \cdots & 0\\ 0 & -\frac{1}{\sqrt{\Delta x_{c_2}}} & \frac{1}{\sqrt{\Delta x_{c_2}}} & \cdots & 0\\ 0 & 0 & \ddots & \ddots & \vdots\\ 0 & 0 & 0 & -\frac{1}{\sqrt{\Delta x_{c_{N-1}}}} & \frac{1}{\sqrt{\Delta x_{c_{N-1}}}}\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \end{split} \end{equation}

If $W_x$ is a matrix that is written as an $N \times N$ matrix, then the last row is zero. The reason the last row is zero is because there are $N-1$ segments on which linear gradients have been defined. Effectively the two $1/2$ cells on each end are neglected.

So to summarize, we have $\phi_m = \phi_s + \phi_x$ with

\begin{equation} \begin{split} \phi_m & = \phi_s + \phi_x \\[0.4em] & = \alpha_s \|W_s m\|_2^2 + \alpha_x \|W_x m\|_2^2 \\[0.4em] \end{split} \end{equation}

Next, we will write this more compactly by stacking $W_s$ and $W_x$ into a matrix $W_m$ as follows

\begin{equation} \begin{split} W_m = \begin{bmatrix} \alpha_s W_s\\ \alpha_x W_x \end{bmatrix} \end{split} \end{equation}

Now we can write $\phi_m$ as the 2-norm of one matrix-vector operation:

$$ \phi_m = \| W_m m \|_2^2 $$

As before, if we want to describe with respect to a reference model $m_{ref}$ we could write:

$$ \phi_m = \| W_m (m-m_{ref})\|_2^2 $$

Model Objective Function

If we go back and recall what was discussed in the introduction, the model objective function casts the inverse problem as an optimization problem, and as mentioned, we will be using gradient-based optimization, so we will need to take derivatives. The complete model objective function that we are dealing with will contain both the data misfit and the model regularization. This means that we can write it as $\phi$ as the sum of the two and then differentiate:

$$ \phi = \phi_d + \beta \phi_m $$

For the linear problem we are considering $$ \phi_d = \frac{1}{2}\| W_d (Gm-d^{obs})\|_2^2 = \frac{1}{2}(Gm-d^{obs})^T W_d^T W_d (Gm-d^{obs}) $$ and $$ \phi_m = \frac{1}{2} \|W_m (m-m_{ref}) \|^2_2 = \frac{1}{2}(m-m_{ref})^T W_m^T W_m (m-m_{ref}) $$

To simplify the terms and see the math a little more clearly, let's note that $W_d(Gm-d^{obs})$, and $\beta W_m(m-m_{ref})$ are simply vectors. And since we are taking the square of the 2-norm, all that we are really doing is taking the dot product of each vector with itself. So let $z=W_d(Gm-d^{obs})$, and let $y=W_m(m-m_{ref})$ where both $z$ and $y$ vectors are functions of $m$. So then:

$$ \phi_d = \frac{1}{2}\|z\|_2^2 = \frac{1}{2}z^T z $$


$$ \phi_m = \frac{1}{2}\|y\|_2^2 =\frac{1}{2}y^T y $$

To minimize this, we want to look at $\nabla \phi$. Using our compact expressions: $$ \phi = \phi_d + \beta \phi_m = \frac{1}{2}z^Tz + \beta \frac{1}{2}y^Ty \\ $$

Taking the derivative with respect to $m$ yields: \begin{equation} \begin{split} \frac{d \phi}{dm}& = \frac{1}{2} \left(z^T \frac{dz}{dm} + z^T \frac{dz}{dm} + \beta y^T \frac{dy}{dm} + \beta y^T \frac{dy}{dm}\right)\\\\[0.6em] & = z^T \frac{dz}{dm} + \beta y^T \frac{dy}{dm} \end{split} \end{equation}

Note that $$\frac{dz}{dm} = \frac{d}{dm}(W_d(Gm-d^{obs})) = W_d G $$

and

$$ \frac{dy}{dm} = \frac{d}{dm}(W_m (m-m_{ref})) = W_m $$

Next, let's substitute both derivatives, our expressions for $z$ and $y$, apply the transposes, and rearrange:

\begin{equation} \begin{split} \frac{d \phi}{dm} & = z^T \frac{dz}{dm} + \beta y^T \frac{dy}{dm} \\[0.6em] & = (W_d(Gm-d^{obs}))^T W_d G + \beta (W_m (m-m_{ref}))^T W_m\\[0.6em] & = (Gm-d^{obs})^T W_d^T W_d G + \beta (m-m_{ref})^T W_m^T W_m \\[0.6em] & = ((Gm)^T - d^T) W_d^T W_d G + \beta (m^T-m_{ref}^T)W_m^T W_m \\[0.6em] & = (m^T G^T - d^T) W_d^T W_d G + \beta m^T W_m^T W_m - \beta m_{ref}^T W_m^T W_m \\[0.6em] & = m^T G^T W_d^T W_d G - d^T W_d^T W_d G + \beta m^T W_m^T W_m - \beta m_{ref}^T W_m^T W_m\\[0.6em] & = m^T G^T W_d^T W_d G + \beta m^T W_m^T W_m - d^T W_d^T W_d G - \beta m_{ref}^T W_m^T W_m \end{split} \end{equation}

Now we have an expression for the derivative of our equation that we can work with. Setting the gradient to zero and gathering like terms gives:

\begin{equation} \begin{split} m^T G^T W_d^T W_d G + \beta m^T W_m^T W_m = d^T W_d^T W_d G + \beta m_{ref}^T W_m^T W_m\\[0.6em] (G^T W_d^T W_d G + \beta W_m^T W_m)m = G^T W_d^T W_d d + \beta W_m^T W_m m_{ref}\\[0.6em] \end{split} \end{equation}

From here we can do two things. First, we can solve for $m$, our recovered model:

\begin{equation} \begin{split} m = (G^T W_d^T W_d G + \beta W_m^T W_m)^{-1} (G^T W_d^T W_d d + \beta W_m^T W_m m_{ref})\\[0.6em] \end{split} \end{equation}

Second, we can get the second derivative simply from the bracketed terms in the left hand side of the equation above: \begin{equation} \frac{d^2 \phi}{dm^2} = G^T W_d^T W_d G + \beta W_m^T W_m \end{equation}

In the model problem that we are solving, second derivative information is not required to obtain a solution, however, in non-linear problems or situations when higher order information is required, it is useful to have this available when we need it.

Solving for $m$ in Python

Before we solve for $m$, we will recreate what we had in the first module. First, install the appropriate packages:


In [1]:
# Import Packages
from SimPEG import *
from IPython.html.widgets import interactive
import numpy as np
%pylab inline


Efficiency Warning: Interpolation will be slow, use setup.py!

            python setup.py build_ext --inplace
    
Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['interactive']
`%matplotlib` prevents importing * from pylab and numpy

Here is the model that we had previously:


In [8]:
# Begin by creating a ficticious set of model data

M=1000                      # Set number of model parameters 
x=np.linspace(0,1,M+1)      # Define 1D domain on nodes
xc = 0.5*(x[1:] + x[0:-1])  # Define 1D domain on cell centers
m = np.zeros(M)             # preallocate m array 

# Define Gaussian function:
gauss = lambda x, a, mean, SD: a*np.exp(-(x-mean)**2./(2.*SD**2.)) # create a Gaussian function

# Choose parameters for Gaussian, evaluate, and store in an array, f.
SD=6                           
mean=0
a=1/(SD*sqrt(2*pi))
x2=np.linspace(-20,20,0.2*M)  
f = gauss(x2,a, mean,SD)

# Define a boxcar function:
box = 0.4*np.ones(0.2*M)

# Insert the Gaussian and Boxcar into m:
m[0.2*M:0.4*M]=box
m[0.6*M:0.8*M]=10*f
   
# Plot      
pylab.plot(xc,m)
pylab.xlabel('x')
pylab.ylabel('m(x)')
pylab.title('Model, $m(x)$')


Out[8]:
<matplotlib.text.Text at 0xad644e0>

Again, we define out kernel functions, averaging and volume matrices as before:


In [9]:
# Make the set of kernel functions

g = lambda x, i, p, q: np.exp(-p*i*x)*np.cos(2*np.pi*q*i*x) # create an anonymous function as immediately above
p = 0.01      # Set values for p, q
q = 0.15
N = 20        # specify number of output data
Gn = np.zeros((M+1,N))

for i in range(N):
    f = g(x,i,p,q)
    Gn[:,i] = f
    
# Plot  
pylab.plot(x,Gn)
pylab.xlabel('x')
pylab.ylabel('g(x)')
pylab.title('Kernel functions, $g(x)$')

# Make Averaging Matrix
n=M+1            # Define n as the N+1 dimension of the matrix   
w=n-1            # Define w and the n-1  dimentsion of the matix   
s = (M,n)        # Store matrix values 
Av = np.zeros(s) # Create a matrix of zeros of the correct dimensions 
                 # and fill in with elements usin the loop below (note the 1/2 is included in here). 
for i in range(M):
    j=i
    k=i+1
    Av[i,j] = 0.5  
    Av[i,k] = 0.5


# make the Volume, "delta x" array
Deltax = diff(x)
V = np.diag(Deltax)     # create diagonal matrix


Last, we produced our data:


In [11]:
G=np.dot(np.transpose(np.dot(Av, Gn)), V)
d = np.dot(np.dot(np.transpose(np.dot(Av, Gn)), V),m)   # Create synthetic data
xd=np.linspace(0,1,len(d))                              # make x array for data 

# Plot

pylab.plot(xd,d)
pylab.xlabel('')
pylab.ylabel('d')
pylab.title('Synthetic Data $d$')


Out[11]:
<matplotlib.text.Text at 0xad66e48>

Introducing noise to the data

This is where we stood at the end of the last module. Next, to simulate taking data in the field, we are going to add a noise to the data before we perform our inversion. We will do this by defining a lambda function that assigns a floor value and percent scaling factor. Also, we will assume that the noise is Gaussian. We then add the noise $r$ to the original data to make a simulated vector of observed data. The superposition of our noise and original data is plotted below.


In [12]:
# Add noise to our synthetic data

noise = lambda fl, length, data, s: fl + randn(length)*data*s # introduce noise using a floor (f), length (l) and scaling factor (s)

r = noise(0, len(d), d, 0.04)

dobs= d + r

#pylab.plot(xd,r)
pylab.plot(xd,d)
pylab.plot(xd,dobs)
pylab.xlabel('')
pylab.ylabel('d')
pylab.title('Synthetic Data and noise, $d$')


Out[12]:
<matplotlib.text.Text at 0xad50ef0>

Calculating $\phi_d$

We are now in a position to build up the data misfit term, $\phi_d$. We will need a function to compute the 2-norm, so constructing a lambda function to do this is useful. Next we will make the matrix $W_d$, which is a diagonal matrix that contains the inverses of the variances in the uncertainty in our data. Again, we will define a floor and percent error for each datum. Assigning a floor to the uncertainties in our case will consist of taking the Euclidian length of our data vector and scaling it by the number of data values that we have. Last, we calculate $phi_d$ using our 2-norm function that we created. It is insightful to see what values have been assigned to our floor and misfit, so they are printed below.


In [13]:
# Calculate the data misfit, phi_d #
####################################

# Anonymous function for 2-Norm
L2 = lambda A, w: dot(dot(w.T,A.T),dot(A,w))  

#This constructs the inverses of the variances. 
invvar = lambda floor, percent, data: 1./(floor + percent*np.abs(data))
                            
# assign a floor
flr = 0.015*dot(d.T,d)**0.5/len(d)

# Make Wd
eps = invvar(flr,0.02,dobs) # define epsilon and Wd
Wd=np.diag(eps) 

# Take the 2-norm
phi_d = L2(Wd, np.dot(G,m)-dobs)
print phi_d
print flr


28.8794257836
0.000200320925096

Choosing a reference model

In choosing a reference model, we need to know something about the variation of physical properties in the subsurface. Given that we are looking for property contrasts relative to a background property, and assuming that we know nothing about the number of targets, their values, or their locations, a good first place to start is to assign a constant background value. In real-world situations, this would represent the background value of the surrounding earth (for example, the conductivity, seismic velocity, density, etc. of our country rock). In the case of our synthetic model, we are going to take the mean value of our model and use that as a constant background.


In [14]:
# Reference Model#
##################
## A constant reference model ##
mref = np.mean(m)*np.ones(M)   

# Plot      
pylab.plot(xc,mref)
pylab.xlabel('x')
pylab.ylabel('m(x)')
pylab.title('Reference Model, $m(x)$')


Out[14]:
<matplotlib.text.Text at 0xae97a20>

Setting up the spacial domains

Here we are going to set up the domains as before, with a vector for points on the cell centers and for points on the nodes. These will be defined as $l$ and $h$.


In [18]:
# Domains #
###############################                       # Set up domains:
l = np.power(x[1:len(x)]-x[0:len(x)-1],0.5)           # vector of distances between nodes, l
midx=np.dot(Av,x)                                     # vector of midpoints, midx 
h = np.power(midx[1:len(midx)]-x[0:len(midx)-1], 0.5) # Calculate distances between midpoints & take sqr roots of each value in vector h

Calculating the model regularization, $\phi_m$

As discussed above, we are going to first need to make our $W_m$ matrix, which is a partitions matrix from two other matrices, $W_s$ and $W_x$, each scaled by a separate parameter $\alpha_s$ and $\alpha_x$. We are going to discussed the manner in which $\alpha_s$ and $\alpha_x$ are selected in more detail during the next module. But for the moment, we will set then as they are defined below. Once this matrix is built up, calculating $\phi_m$ is a simple matter, given that we have made a function to compute the 2-norm already. For the sake of illustration, I compute and print $\phi_m$ from the residual of our reference model $m_{ref}$ and our true model $m$. However, of interest to us will be the residual of the model that we recover $m_{rec}$ and our reference.


In [19]:
#  Calculate the model regularization, phi_m #
##############################################

# Create Ws, Wx matriceshes
Ws=np.diag(l)                                  # put length vector into a diagonal matrix, Ws
Wx = np.zeros((len(m), len(m)))                # preaollocate array and enter values into matrix, Wx
for i in range(len(h)):
    j=i
    k=i+1
    Wx[i,j] = h[i]  
    Wx[i,k] = h[i]  
    
alpha_s=0.1                                    # Set alpha_s and alpha_x values  
alpha_x=0.15

# build Wm #
Wm=np.concatenate((alpha_s*Ws, alpha_x*Wx), axis=0)

phi_m = L2(Wm, mref-m)
print (phi_m)


0.00557850734497

Inverting for our recovered model

At last we can invert to find our recovered model and see how it compares with the true model. First we will assign a value for $\beta$. As with the $\alpha$ parameters from before, we will assign a value, but the choice of beta will be a topic that we explore more fully in the next module. Once our $\beta$ value is assigned, we will define yet another lambda function to obtain the recovered model, plot it against our true model, and then output our results for $\phi_d$ and $\phi_m$.


In [20]:
beta = 1000000
# Set beta value

# Get recovered model
mrecovered = lambda G,Wd,Wm,data,mref, beta: dot(np.linalg.inv(dot(dot(G.T,Wd.T),dot(Wd,G)) + beta*dot(Wm.T,Wm)),dot(dot(G.T,Wd),dot(Wd,dobs)) + beta*dot(dot(Wm.T,Wm),mref))
# mrec = dot(np.linalg.inv(dot(dot(G.T,Wd.T),dot(Wd,G)) + beta*dot(Wm.T,Wm)),dot(dot(G.T,Wd),dot(Wd,dobs)) + beta*dot(dot(Wm.T,Wm),mref))

mrec = mrecovered(G,Wd,Wm,dobs,mref,beta)

# Get residual of mref and mrec
phi_d2 = L2(Wd,np.dot(G,mrec)-dobs)
phi_m2 = L2(Wm, mref-mrec)


print phi_d2
print phi_m2

# Plot      
pylab.plot(xc,m)
pylab.plot(xc,mrec)
pylab.xlabel('x')
pylab.ylabel('m(x)')
pylab.title('Model, $m(x)$')


600.74350741
0.00271419323233
Out[20]:
<matplotlib.text.Text at 0xb0bffd0>

As a last step, we will obtain our predicted data and see how it compares with the synthetic data that we used initially.


In [21]:
# Get predicted data

dpred = np.dot(G,mrec)

#pylab.plot(xd,r)
pylab.plot(xd,dpred,'o')
pylab.plot(xd,dobs)
pylab.plot(xd,d,'x')
pylab.xlabel('')
pylab.ylabel('d')
pylab.title('Predicted and Observed Data')


Out[21]:
<matplotlib.text.Text at 0xdeaf908>

This concludes the current module. For the next module, we will examine the constraints on our choice for $\alpha_s$ and $\alpha_x$, and then introduce the Tikhonov curve and a method for choosing $\beta$.